some terminology in learning theory.
when studying learning theory, we care about how well a hypothesis performs, not about the "specific parameterization of hypotheses or whether it is linear classification".
so we define a hypothesis class H
training error / empirical risk / empirical error of hypothesis h
- the training set has size N
- assumption 1 (one of the PAC assumptions): training examples $(x^{(i)},y^{(i)})$ are drawn iid from some probability distribution D
- $\hat{\varepsilon}(h) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\{h(x^{(i)}) \neq y^{(i)}\}$, i.e. the fraction of training examples that h misclassifies
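to make the formula concrete, a minimal sketch, assuming `h` is a binary classifier and `X`, `y` are the training inputs and labels (all names here are illustrative):

```python
import numpy as np

def training_error(h, X, y):
    """Empirical risk: the fraction of training examples that h misclassifies."""
    preds = np.array([h(x) for x in X])            # h(x^{(i)}) for each example
    return float(np.mean(preds != np.asarray(y)))  # 1/N * sum of error indicators
```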
then we can define:
generalization error
- DEF: under assumption 1, it is the probability that h misclassifies a new example $(x, y)$ drawn from the distribution D: $\varepsilon(h) = P_{(x,y)\sim D}\big(h(x) \neq y\big)$
- it decomposes into two components: bias and variance
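we never know D in practice, but the definition suggests a thought experiment: if we had an oracle that draws fresh $(x, y)$ pairs from D, generalization error is just a misclassification probability we could estimate by Monte Carlo. a sketch under that assumption (`sample_from_D` is a hypothetical oracle):

```python
import numpy as np

def generalization_error_mc(h, sample_from_D, n_draws=100_000, seed=0):
    """Monte Carlo estimate of eps(h) = P(h(x) != y) on fresh draws from D."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(n_draws):
        x, y = sample_from_D(rng)   # hypothetical oracle for the distribution D
        errors += int(h(x) != y)
    return errors / n_draws
```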
empirical risk minimization (ERM)
- the process of minimizing training error: $\hat{h} = \arg\min_{h \in H} \hat{\varepsilon}(h)$
- think of ERM as the most "basic" learning algorithm
- logistic regression can be viewed as an approximation of ERM: it minimizes a smooth surrogate of the 0-1 training error (see the sketch after this list)
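a minimal sketch of ERM, assuming a *finite* hypothesis class H so the argmin can be taken by brute force; here H is a toy set of 1-D threshold classifiers (an illustrative choice, not from the notes):

```python
import numpy as np

def erm(H, X, y):
    """Return the hypothesis in H with the smallest training error."""
    y = np.asarray(y)
    errors = [np.mean(np.array([h(x) for x in X]) != y) for h in H]
    return H[int(np.argmin(errors))]

# toy finite hypothesis class: h_t(x) = 1{x > t} for a grid of thresholds t
H = [lambda x, t=t: int(x > t) for t in np.linspace(-3.0, 3.0, 61)]
```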
expected train error
- taken by averaging (taking the expectation) over all possible training datasets of size N
- conceptually this means training on infinitely many datasets and averaging, which we can't do; instead we estimate it by drawing m training datasets of size N and averaging the training error from each (sketched below)
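the estimation procedure above, as a sketch; `learn` (a training routine returning a hypothesis) and `sample_dataset` (a data oracle) are hypothetical stand-ins:

```python
import numpy as np

def expected_training_error(learn, sample_dataset, m=50, N=100, seed=0):
    """Estimate the expected train error: draw m datasets of size N,
    train on each, and average the resulting training errors."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(m):
        X, y = sample_dataset(rng, N)   # hypothetical oracle: one dataset of size N
        h = learn(X, y)                 # hypothetical training routine
        preds = np.array([h(x) for x in X])
        errs.append(np.mean(preds != np.asarray(y)))
    return float(np.mean(errs))
```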
in-sample test error
- the error of h on one given test pair $(x, y)$
test error
- taken by averaging (taking the expectation) over the test data, i.e. over all the in-sample test errors
expected test error
- average over all possible training datasets of size N; again we can't do this, so we estimate it using our limited test set (see the sketch below)
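in code, the practical estimate mirrors the training-error computation, just on held-out data: average the in-sample test errors over our limited test set (a sketch, names illustrative):

```python
import numpy as np

def test_error(h, X_test, y_test):
    """Average the per-example (in-sample) test errors over a held-out set;
    this is our practical estimate of the (expected) test error."""
    preds = np.array([h(x) for x in X_test])
    return float(np.mean(preds != np.asarray(y_test)))
```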